ABSTRACT

We describe Ganalyzer, a model-based tool that can automatically analyze and 
classify galaxy images. Ganalyzer works by separating the galaxy pixels from the 
background pixels, finding the center and radius of the galaxy, generating the radial 
intensity plot, and then computing the slopes of the peaks detected in the radial 
intensity plot to measure the spirality of the galaxy and determine its morphological 
class. Unlike algorithms that are based on machine learning, Ganalyzer is based on 
measuring the spirality of the galaxy, a task that is difficult to perform manually, and 
in many cases can provide a more accurate analysis compared to manual observation. 
Ganalyzer is simple to use, and can be easily embedded into other image analysis
applications. Another advantage is its speed, which allows it to analyze ~10,000,000 galaxy
images in five days using a standard modern desktop computer. These capabilities can
make Ganalyzer a useful tool in analyzing large datasets of galaxy images collected
by autonomous sky surveys such as SDSS, LSST or DES. The software is available
for free download at http://vfacstaff.ltu.edu/lshamir/downloads/ganalyzer,
and the data used in the experiment is available at
http://vfacstaff.ltu.edu/lshamir/downloads/ganalyzer/GalaxyImages.zip.

Subject headings: methods: data analysis - techniques: image processing - surveys - 
Galaxy: general 



1. Introduction 



Robotic telescopes that acquire large datasets of astronomical images have introduced the
need for methods and tools that can automatically analyze astronomical images and turn these
data into knowledge. One of the tasks that require automation is the morphological analysis
of galaxy images, which is an essential tool in sky surveys such as SDSS (York et al. 2000) or
the future LSST (Tyson 2002), DES (Abbott 2005), and the space-based TAUVEX galaxy survey
(Brosch & Almoznino 2007).

The first attempts to automatically classify galaxies were made by Morgan & Mayall (1957,
1969), and followed by the work of Kormendy & Bender (1996), who tried to classify elliptical
galaxies by their internal structures. Other studies used central concentration as an indicator that
can determine the position of a galaxy on the Hubble sequence (Doi, Fukugita & Okamura 1993;
Brinchmann et al. 1998; Shimasaku et al. 1998), or the central concentration and asymmetry of
galaxian light (Abraham et al. 1996). Another approach to galaxy image analysis is the parametric
approach, used by tools such as GIM2D (Simard 1998) and GALFIT (Peng et al. 2002), which can
be wrapped by the GALAPAGOS script to improve its performance (Haussler 2007).



Later attempts to perform automatic morphological classification of galaxies include the Gini
coefficient method (Abraham, Van Den Bergh & Nair 2003) and the CAS method (Conselice
2003). However, the efficacy of these methods for real-life galaxy morphological classification has
been criticized (Thorsten 2008), and they did not provide solid useful tools that can be used for galaxy
morphological analysis (Lintott et al. 2008). This led to the contention that practical classification
of large datasets of galaxy images should be carried out by humans (Lintott et al. 2008).



A significant improvement was introduced by the application of machine learning approaches
to the task of galaxy classification. Huertas-Company et al. (2008, 2009) used a Support Vector
Machine for galaxy classification and probability density estimation, and applied the method to SDSS
DR7 (Huertas-Company et al. 2011). Ball et al. (2008) achieved good results by utilizing an
Artificial Neural Network. Recent studies also showed significant improvement in the accuracy of
automatic classification of galaxy images used by the Galaxy Zoo project, demonstrating accuracy
of ~90% for the classification of the three primary morphological types (spiral, elliptical, and
edge-on), and ~95% accuracy when classifying spiral and elliptical galaxies (Shamir 2009; Banerji et al.
2010). While showing good classification accuracy, these machine learning methods require a step
of training, and normally do not provide useful information about the galaxy other than its
morphological type. Here we describe Ganalyzer, which is a fast and easy to use model-based tool
that measures the ellipticity and spirality of galaxies. In Section 2 the image analysis method is
described. Section 3 discusses the performance evaluation and experimental results, and Section 4
provides an introduction to using the Ganalyzer command line utility.



2. Morphological analysis method 

2.1. Finding the galaxy center, ellipticity, and position angle 

The first step of Ganalyzer is detecting the objects in the image and extracting basic
information about each object such as the center, ellipticity and position angle, as done by object
detection methods such as SExtractor (Bertin & Arnouts 1996). This goal is achieved by first
separating the objects from the background by applying the Otsu threshold, which is a
widely used method for determining the graylevel threshold that separates the foreground
pixels from the background pixels. Figure 1 shows an original galaxy image taken from Galaxy Zoo and
the foreground galaxy pixels detected using the Otsu method.
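
The thresholding step can be illustrated with a minimal sketch of the standard Otsu method, which picks the gray level that maximizes the between-class variance of the histogram. This is not Ganalyzer's own code: the function name `otsu_threshold`, the 256-level assumption, and the convention that pixels strictly above the threshold are foreground are all choices made for the example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    `pixels` is a flat iterable of integer gray levels in [0, levels).
    In this sketch, pixels with a value > t are treated as foreground.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    n_bg = 0    # number of pixels with gray level <= t
    sum_bg = 0  # sum of the gray levels of those pixels
    for t in range(levels):
        n_bg += hist[t]
        sum_bg += t * hist[t]
        n_fg = total - n_bg
        if n_bg == 0:
            continue          # no background pixels yet
        if n_fg == 0:
            break             # no foreground pixels left
        mean_bg = sum_bg / n_bg
        mean_fg = (sum_all - sum_bg) / n_fg
        between = n_bg * n_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t
```

A foreground mask then follows as `[p > t for p in pixels]`, with `t` being the returned threshold.
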

Once foreground pixels are separated from the background pixels, all eight-connected objects
whose surface size is larger than 1000 pixels are detected. Detecting objects in the image allows
Ganalyzer to analyze images in which more than one object is present, and the 1000 threshold is
used to reject small foreground objects (surface size smaller than 1000 pixels) that are present in
the image but are too small to provide an interpretable morphological structure.
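
The object detection described above can be sketched as a breadth-first flood fill over the foreground mask using eight-connectivity. This is an illustrative sketch rather than the tool's implementation; the function name `large_objects` and the representation of the mask as a list of rows of booleans are assumptions of the example.

```python
from collections import deque


def large_objects(mask, min_size=1000):
    """Return the pixel lists of all 8-connected objects larger than min_size.

    `mask` is a list of rows; a truthy entry marks a foreground pixel.
    Each object is returned as a list of (row, col) tuples.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Flood-fill a new object starting from this unvisited pixel.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                # Visit all eight neighbours (including the diagonals).
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            if len(pixels) > min_size:
                objects.append(pixels)
    return objects
```
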

After the objects are detected, each object is assigned its pixel mass center, computed as
the image coordinates $(v, w)$ such that the number of pixels $(x, y)$ where $x < v$ equals the number
of pixels $(x, y)$ where $x > v$ (Shamir et al. 2008). Similarly, the $w$ coordinate is computed such that
the number of pixels where $y < w$ equals the number of pixels where $y > w$. Then, the galaxy
center $(O_x, O_y)$ is determined as the center of the 5×5 shifted window that has the highest median
value and whose distance from the pixel mass center of the object is smaller than $0.1\sqrt{S/\pi}$, where $S$ is
the surface size of the object in pixels. This simple and fast method accurately detected the center
of the galaxy in all 525 galaxy images tested in this study.
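
The two steps of this paragraph can be sketched as follows: the pixel mass center is the per-axis median of the foreground coordinates, and the galaxy center is the 5×5 window with the highest median near that point. The function names are hypothetical, the image is assumed to be a list of rows indexed as `image[y][x]`, and the distance limit is passed in as `max_dist` by the caller rather than computed here.

```python
from statistics import median


def pixel_mass_center(pixels):
    """Median x and median y of a list of (x, y) foreground pixels.

    For an even number of pixels this sketch takes the upper median.
    """
    xs = sorted(x for x, _ in pixels)
    ys = sorted(y for _, y in pixels)
    mid = len(pixels) // 2
    return xs[mid], ys[mid]


def galaxy_center(image, pixels, max_dist):
    """Center of the 5x5 window with the highest median intensity.

    Only windows whose center lies closer than max_dist to the pixel
    mass center are considered; ties keep the first window found.
    """
    v, w = pixel_mass_center(pixels)
    height, width = len(image), len(image[0])
    best, best_median = (v, w), float("-inf")
    for y in range(2, height - 2):
        for x in range(2, width - 2):
            if (x - v) ** 2 + (y - w) ** 2 >= max_dist ** 2:
                continue
            window = [image[y + dy][x + dx]
                      for dy in range(-2, 3) for dx in range(-2, 3)]
            m = median(window)
            if m > best_median:
                best, best_median = (x, y), m
    return best
```
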

After the galaxy center is found, the radius of the galaxy is determined as the maximal distance
between the object center and any foreground pixel. The major axis of the galaxy is determined as
the longest possible line that passes through the center, and the minor axis is determined by the
length of the line that passes through the center at 90° to the major axis. The ellipticity of the
galaxy is then determined by the minor axis of the galaxy divided by its major axis. In addition
to the ellipticity, Ganalyzer also computes the position angle of the galaxy.



2.2. Detecting spirality 

The basic element used in this study for measuring spirality is the radial intensity plot. The
radial intensity plot is a 360×35 image, such that the value of the pixel $(x, y)$ is the median value
of the 5×5 window around the pixel at image coordinates $(O_x + \sin(\theta) \cdot r,\ O_y - \cos(\theta) \cdot r)$ in
the galaxy image, where $\theta$ is the polar angle (in degrees) and $r$ is a radial distance. Intuitively,
the radial intensity plot is an image of the radial intensities at different distances from the galaxy
center. Each horizontal line in the radial intensity plot is then smoothed using a median filter with
a span of 50 pixels. If the radius of the galaxy is greater than 100 pixels, the radial intensity plot
is computed after downscaling the object such that the radius is set to 100. This practice helps
to analyze high-resolution images in which star forming regions or substructures in the spirals can
affect the detection of the arms.
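
A minimal sketch of the sampling step is given below. It assumes that the 35 rows correspond to 35 radial distances spaced at 1% of the galaxy radius, starting at a fraction `start` of the radius; this spacing is an inference from the 0.35r-wide ranges used later in the text, not something stated explicitly. The function name and the `image[y][x]` layout are also assumptions, and the row smoothing and the downscaling are omitted.

```python
import math
from statistics import median


def radial_intensity_plot(image, ox, oy, radius, start=0.40, steps=35, step=0.01):
    """Sample a steps x 360 radial intensity plot around the center (ox, oy).

    Row i holds, for each polar angle 0..359 degrees, the median of the
    5x5 window around (ox + sin(theta) * r, oy - cos(theta) * r), where
    r = radius * (start + i * step).
    """
    height, width = len(image), len(image[0])
    plot = []
    for i in range(steps):
        r = radius * (start + i * step)
        row = []
        for theta in range(360):
            t = math.radians(theta)
            x = int(round(ox + math.sin(t) * r))
            y = int(round(oy - math.cos(t) * r))
            # Collect the 5x5 neighbourhood, clipped to the image borders.
            window = [image[yy][xx]
                      for yy in range(y - 2, y + 3)
                      for xx in range(x - 2, x + 3)
                      if 0 <= yy < height and 0 <= xx < width]
            row.append(median(window) if window else 0)
        plot.append(row)
    return plot
```
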

Figure 2 shows the original galaxy image and a transformation of the radial intensity plot such
that the Y axis is the intensity and the X axis is the polar angle. As the figure shows, in an image
of a spiral galaxy the peaks are expected to shift due to the spirality of the arms, while if the galaxy
is not spiral the peaks are expected to align in a near-straight line.

To use the shift of the peaks in the radial intensity plot for detecting spirality, the peaks are first
detected using an automatic peak detection algorithm (Morhac et al. 2000), with the parameters
σ = 10, threshold, and iterations set as described in (Morhac et al. 2000). Figure 3 shows the
peaks detected in the galaxy images of Figure 2 such that the radial distance is between 0.4r and
0.75r, where r is the radius of the galaxy described in Section 2.1.

Fig. 2. Galaxy images (left) and the transformation of the radial intensity plots such that the Y
axis is the intensity and the X axis is the polar angle

Once the peaks are detected, a linear regression is used to determine the slope of each of the
two groups of peaks that have the highest number of detected peaks. For instance, in the galaxies
of Figure 3 the peaks of the spiral galaxy are organized in two lines with slopes of ~0.74 and ~0.81,
while the slopes of the peaks of the elliptical galaxy are ~0.22 and ~0.1. The slopes of the arms
reflect the level of spirality of the examined galaxy. To avoid the effect of local variations that
can lead to peaks, such as stars or satellite galaxies, only groups that have 20 or more peaks are
included in the analysis.
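
The slope measurement can be sketched with an ordinary least-squares fit. The sketch assumes that each group of peaks is given as a list of (radial distance, polar angle) pairs and that the slope is the change in angle per unit of radial distance; the text does not state the axis convention, so this orientation, like the function names, is an assumption.

```python
def fit_line(points):
    """Least-squares slope and intercept of y on x for (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def arm_slopes(groups, min_peaks=20):
    """Slopes of the (up to) two largest groups with at least min_peaks peaks."""
    kept = sorted((g for g in groups if len(g) >= min_peaks),
                  key=len, reverse=True)
    return [fit_line(g)[0] for g in kept[:2]]
```

Groups with fewer than 20 peaks are discarded before the two most populated groups are selected, which mirrors the rejection of peaks caused by stars or satellite galaxies.
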

If just one arm is detected, a galaxy is considered spiral if the absolute value of the slope
of the arm is greater than 0.35. If more than one arm is detected, the analysis is based on the
two arms with the largest number of peaks. If both arms have slopes greater than 0.5, or if one
of the arms has a slope greater than 0.7, then the galaxy is considered by Ganalyzer as spiral.
If the standard deviation of one of the arms is smaller than 2, then a slope greater than 0.35 in
this arm will also be considered by Ganalyzer as an indication of a spiral galaxy. These rules
were determined experimentally by comparing the analysis of the slopes to the manual galaxy
classification performed by the author using the galaxy image datasets used in (Shamir 2009).
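
The decision rules above can be written down as a short function. This is a sketch of the rules as stated, not the tool's source: the name `is_spiral` and the input format are hypothetical, and applying the absolute value of the slope in every rule (the text only says so explicitly for the single-arm case) is an assumption.

```python
def is_spiral(arms):
    """Apply the slope rules to a list of (slope, deviation) pairs.

    `arms` is ordered by number of peaks, most populated arm first.
    """
    if not arms:
        return False
    if len(arms) == 1:
        return abs(arms[0][0]) > 0.35
    top_two = arms[:2]
    slopes = [abs(slope) for slope, _ in top_two]
    # Both arms steep, or one arm very steep.
    if all(s > 0.5 for s in slopes) or any(s > 0.7 for s in slopes):
        return True
    # A moderately steep arm also counts if its peaks are tightly aligned.
    return any(abs(slope) > 0.35 and dev < 2 for slope, dev in top_two)
```

For example, `is_spiral([(0.74, 1.28), (0.81, 0.80)])` returns `True`, since both slopes exceed 0.5.
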



The slope of the arm of a spiral galaxy can peak at different distances from the center in
different galaxies. Therefore, the peaks are detected in four different ranges of distances from the
center: 0.1r to 0.45r, 0.2r to 0.55r, 0.3r to 0.65r, and 0.4r to 0.75r, such that r is the radius of the
galaxy described in Section 2.1. If the slopes of the peaks detected in any of these ranges meet the
criteria described above, then the galaxy is determined to be spiral.

Ganalyzer also detects the presence of bars, which is performed by analyzing the vertical lines
in the radial intensity plot generated for distances 0.5r to 1.0r. While the intensity is normally
expected to decrease when moving away from the galaxy center, if bars exist it is expected that
the intensities will increase at around the distance of the bar from the center. Therefore, if 50%
or more of the vertical lines of the radial intensity plot show an intensity increase, the galaxy is
determined to have bars.

If no spirality is detected, the galaxy is determined to be edge-on if the ellipticity described
in Section 2.1 is greater than 0.8. Otherwise, the galaxy is considered elliptical. It should be
noted that Ganalyzer outputs the ellipticity value, which can be more informative than the distinct
morphological class.

3. Experimental Results

Ganalyzer was tested using a dataset of small galaxy images taken from the Galaxy Zoo web site
(Lintott et al. 2008), which was previously used for developing a machine learning-based galaxy image
classification method (Shamir 2009). The first dataset contains 225 images classified manually by
the author as spiral, 225 images classified as elliptical, and 75 galaxy images classified as edge-on
galaxies. All images were color images, and did not contain Galaxy Zoo monochrome images that
were collected for the Galaxy Zoo bias study (Lintott et al. 2008). The dataset is available for free
download at http://vfacstaff.ltu.edu/lshamir/downloads/ganalyzer/GalaxyImages.zip.

Among the 525 galaxy images, 466 were classified correctly, an accuracy of ~89%, where
the gold standard is the manual classification performed by the author. Table 1 shows the confusion
matrix of the classification.
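
The reported accuracy can be checked directly from the confusion matrix. The sketch below uses the counts of Table 1, with rows taken as the manual classes; the two zero entries are an inference from the stated class sizes of 225 spiral and 225 elliptical images, since those cells are empty in the table.

```python
# Rows: manual class. Columns: class assigned by Ganalyzer.
confusion = {
    "spiral":     {"spiral": 206, "elliptical": 19,  "edge-on": 0},   # 0 inferred
    "elliptical": {"spiral": 34,  "elliptical": 191, "edge-on": 0},   # 0 inferred
    "edge-on":    {"spiral": 3,   "elliptical": 3,   "edge-on": 69},
}

total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[c][c] for c in confusion)
accuracy = correct / total

# Fraction of each manual class that Ganalyzer agrees with.
per_class = {c: confusion[c][c] / sum(confusion[c].values()) for c in confusion}

print(f"{correct}/{total} correct, accuracy = {accuracy:.3f}")
for name, value in per_class.items():
    print(f"{name}: {value:.3f}")
```

The per-class agreement shows that the elliptical class contributes most of the disagreements.
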

While the classification accuracy of ~89% is less than perfect, it should be noted that manual
classification is subjective, and might also not provide a fully reliable "gold standard" due to the
many in-between cases. For instance, the galaxy in Figure 4 was classified manually as edge-on,
but Ganalyzer classified it as elliptical. In this case the galaxy seemed to the person classifying it
long and narrow enough to be classified as an edge-on galaxy, while Ganalyzer assigned it a
relatively high ellipticity value of ~0.62 but classified it as elliptical.

While in some cases disagreements between Ganalyzer and manual classification can be due
to in-between cases, in other cases Ganalyzer can detect features that are difficult to notice with
casual observation of a galaxy image using the unaided eye. Table 2 shows galaxy images that were
classified manually as elliptical galaxies, but Ganalyzer classified them as spiral.

As the table shows, the radial intensity plots of these galaxies indicate that some of the peaks 
shift as the distance from the center changes, which might indicate that these galaxies feature 
spirality. This spirality might be difficult to detect using the unaided eye, but can be detected more 
easily by Ganalyzer using the radial intensity plots. As Table 1 shows, most of the disagreements
between the manual classification and Ganalyzer were in galaxies that were classified manually 
as elliptical while Ganalyzer classified them as spiral. However, in some cases galaxies that were 
classified manually as spiral were classified by Ganalyzer as elliptical. 

Table 3 shows spiral galaxies that were classified incorrectly. As the table shows, these galaxies
were clearly classified incorrectly by Ganalyzer. While the radial intensity plots show that some
of the peaks shift as the distance from the galaxy center changes, the peaks are
not always detected correctly, and improving the peak detection used in this study (Morhac et al.
2000) might improve the performance of the galaxy classification. Another limitation of Ganalyzer
is that the analysis is dependent on the arms, and therefore if the resolution of the image is too
low and the arms cannot be seen the galaxy might not be analyzed correctly.

To test Ganalyzer with a larger set of galaxy images, another experiment was performed with
a galaxy dataset of 6209 galaxy images classified as spiral and 2316 galaxy images classified as
elliptical by Galaxy Zoo participants (Lintott et al. 2011). The experiment showed that Ganalyzer
was in agreement with the manual classification in ~86% of the cases.

Fig. 3. The peaks detected in the radial intensity plots of the elliptical (top) and spiral galaxies
of Figure 2

Table 1: Confusion matrix of the galaxy classification. Rows are the manual classes and columns
are the classes assigned by Ganalyzer.

              Spiral   Elliptical   Edge-on
Spiral          206        19          0
Elliptical       34       191          0
Edge-on           3         3         69

Fig. 4. A galaxy image that was classified manually as edge-on and as elliptical by Ganalyzer

Table 2: Galaxy images that were classified as elliptical manually and as spiral by Ganalyzer.
Galaxy Zoo ID          Image    Radial Intensity Plot
587736945741201714
587737810113396966
587733397576745343
587735348010156087
587737826749841683
587742189916717258
587742551753883805
588007004703621329
588010360157831362

An important advantage of Ganalyzer is its low computational complexity, which allows it to
process very many images using relatively modest computing resources. For instance, the galaxy
image dataset of 525 images used in this study was processed in ~170 seconds using a single core
of an Intel Core i7 quad-core processor. Therefore, by using eight cores a standard desktop computer
can process ~10,000,000 images in just five days.
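
The throughput estimate follows from simple arithmetic on the figures quoted above. The sketch assumes that processing time scales linearly with the number of images and that the eight cores run fully in parallel, which is an idealization.

```python
images_timed = 525          # images in the test dataset
seconds_timed = 170         # approximate single-core processing time
target_images = 10_000_000
cores = 8
seconds_per_day = 86_400

seconds_per_image = seconds_timed / images_timed
single_core_days = seconds_per_image * target_images / seconds_per_day
eight_core_days = single_core_days / cores

print(f"{seconds_per_image:.3f} s per image")
print(f"{single_core_days:.1f} days on one core")
print(f"{eight_core_days:.1f} days on {cores} cores")
```

Under these assumptions the result is roughly 0.32 seconds per image, about 37.5 days on a single core, and about 4.7 days on eight cores, consistent with the five-day figure.
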



4. Using Ganalyzer 

The Ganalyzer tool is a simple Windows command-line utility that receives a path to a galaxy
image file, and prints the analysis results to the standard output. For instance, the following
command line returns the morphological class, as well as the ellipticity, position angle, surface size
(pixels), radius, image coordinates of the center, and the slopes of the shifts of the peaks detected
in the image "galaxy.tif".



C:\> ganalyzer c:\path\to\galaxy.tif 

The output of Ganalyzer when applied to the spiral galaxy of Figure 2 is:

Object 1:
Center: (53,62)
Surface size (pixels): 3989
Radius (pixels): 45
Ellipticity: 0.295
Position angle: 143 degrees
slopes: 0.74 (stderr 1.28) 0.81 (stderr 0.80)
Galaxy type: Spiral

Currently, the supported file formats are TIFF, JPG, PPM, and BMP. In cases where the 
source images are in the FITS format, the images can be converted to lossless 8 or 16 bit TIFF 
format before being analyzed by Ganalyzer. Since Ganalyzer is used as a command line utility, 
it can be easily embedded into other applications and serve as a component in an astronomical 
pipeline processing system. 
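
Embedding the utility in another program can be sketched as a thin wrapper that runs the executable and parses the printed report. The wrapper below is an assumption-laden illustration: it presumes that an executable named `ganalyzer` is on the search path and that every report follows the "key: value" layout of the sample output shown above; the function names are hypothetical.

```python
import re
import subprocess


def run_ganalyzer(image_path, exe="ganalyzer"):
    """Run the command-line tool on one image and return its text output."""
    result = subprocess.run([exe, image_path],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_output(text):
    """Turn the printed report into a list of dicts, one per detected object.

    Keys are the lower-cased labels of the report; values are kept as strings.
    """
    objects, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        header = re.match(r"Object (\d+):$", line)
        if header:
            current = {"object": int(header.group(1))}
            objects.append(current)
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip().lower()] = value.strip()
    return objects
```

A pipeline could then call `parse_output(run_ganalyzer(path))` for each image and read, for example, the `"galaxy type"` and `"ellipticity"` entries of each object.
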

To allow a more informative analysis of the galaxy image, Ganalyzer can also output the radial
intensity plots and the peaks. This can be done by using the "-i" switch. For instance, the following
command line can be used to generate the radial intensity plot and its transformation described in
Figure 2, as well as the detected peaks as described in Figure 3.

C:\> ganalyzer -i c:\path\to\galaxy.tif

When the "-i" switch is used, these images are created in the working directory, such that
irp.tiff and irp_radial.tiff are the radial intensity plot and the transformation described in Figure 2,
and irp.peaks.tiff is the image of the detected peaks as described in Figure 3.

5. Conclusion 

The increasing availability of robotic telescopes that acquire large datasets of galaxy images 
has introduced the need for automatic methods for galaxy image analysis that can be practically 
used for analyzing these datasets. Ganalyzer is a fast and simple software tool that uses the radial 
intensity plots of galaxy images to measure the spirality and ellipticity of galaxies and classify 
galaxy images into three morphological classes of spiral, elliptical, and edge-on. 

Ganalyzer is based on measuring the spirality of galaxies, and might not be optimal for
detecting morphological features that are not directly related to the ellipticity and spirality. Therefore,
Ganalyzer might not excel in detecting galaxies whose unique morphology is not based on
spirality, such as S0, mergers, or peculiar galaxies.

The tool is used as a command-line utility, so that it can be embedded into other
programs and serve as a component in a more comprehensive system of astronomical image pipeline
processing. Since Ganalyzer is relatively quick, it can be practically used for analyzing very
large datasets containing millions of galaxy images. Ganalyzer can be downloaded freely at
http://vfacstaff.ltu.edu/lshamir/downloads/ganalyzer, or from the Astrophysics Source Code
Library at http://ascl.net.